
Keyword Search Result

[Keyword] feature selection(62hit)

41-60hit(62hit)

  • Rough-Mutual Feature Selection Based on Min-Uncertainty and Max-Certainty

    Sombut FOITONG  Ouen PINNGERN  Boonwat ATTACHOO  

     
    PAPER

    Vol: E95-D No:4
    Page(s): 970-981

    Feature selection (FS) plays an important role in pattern recognition and machine learning. FS is applied to dimensionality reduction, and its purpose is to select the subset of the original features of a data set that is richest in useful information. Most existing FS methods based on rough set theory focus on the dependency function, which relies on the lower approximation to evaluate the goodness of a feature subset. However, by considering only information from the positive region and neglecting the boundary region, most relevant information could be missed. This paper proposes the maximal lower approximation (Max-Certainty) and minimal boundary region (Min-Uncertainty) criterion, which focuses on feature selection methods based on rough sets and mutual information that use the difference between the lower approximation information and the information contained in the boundary region. This idea can yield higher predictive accuracy than the measure based on the positive region (certainty region) alone, which demonstrates that much valuable information can be extracted in this way. Experimental results are presented for discrete, continuous, and microarray data and compared with other FS methods in terms of subset size and classification accuracy.

  • An Informative Feature Selection Method for Music Genre Classification

    Jin Soo SEO  

     
    LETTER-Music Information Processing

    Vol: E94-D No:6
    Page(s): 1362-1365

    This letter presents a new automatic musical genre classification method based on an informative song-level representation, in which the mutual information between the feature and the genre label is maximized. By efficiently combining distance-based indexing with informative features, the proposed method represents a song as one vector instead of complex statistical models. Experiments on an audio genre DB show that the proposed method can achieve classification accuracy comparable or superior to state-of-the-art results.

  • Discrimination between Upstairs and Downstairs Based on Accelerometer

    Yang XUE  Lianwen JIN  

     
    LETTER

    Vol: E94-D No:6
    Page(s): 1173-1177

    This paper presents an algorithm for discriminating between a person going upstairs and going downstairs using a tri-axial accelerometer. The algorithm consists of vertical acceleration calibration, extraction of two kinds of features (Interquartile Range and Wavelet Energy), effective feature subset selection with the wrapper approach, and SVM classification. The proposed algorithm can recognize upstairs and downstairs with 95.64% average accuracy for different sensor locations, i.e., on the subject's waist belt, in the trousers pocket, and in the shirt pocket. Even for the mixed data from all sensor locations, the average recognition accuracy can reach 94.84%. Experimental results have validated the effectiveness of the proposed method.
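The interquartile-range feature named in the abstract above can be illustrated with a minimal sketch. The window size, the sample values, and the helper names `interquartile_range` and `window_features` are assumptions made for illustration; the paper's own calibration, windowing, and wavelet-energy steps are not reproduced here.

```python
from statistics import quantiles


def interquartile_range(samples):
    """Return Q3 - Q1 for one window of acceleration samples."""
    # method="inclusive" treats the window as the full population.
    q1, _, q3 = quantiles(samples, n=4, method="inclusive")
    return q3 - q1


def window_features(signal, window_size):
    """Compute the IQR of each non-overlapping window of the signal."""
    return [
        interquartile_range(signal[start:start + window_size])
        for start in range(0, len(signal) - window_size + 1, window_size)
    ]


# Hypothetical vertical-acceleration samples: one varying window, one flat window.
vertical_acc = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
features = window_features(vertical_acc, window_size=9)
print(features)
```

A flat window yields an IQR of zero, while a window with more spread yields a larger value, which is why the IQR can serve as a simple measure of movement intensity.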

  • Improved Gini-Index Algorithm to Correct Feature-Selection Bias in Text Classification

    Heum PARK  Hyuk-Chul KWON  

     
    PAPER-Pattern Recognition

    Vol: E94-D No:4
    Page(s): 855-865

    This paper presents an improved Gini-Index algorithm to correct feature-selection bias in text classification. The Gini-Index has been used as a split measure for choosing the most appropriate splitting attribute in decision trees. Recently, an improved Gini-Index algorithm for feature selection, designed for text categorization and based on Gini-Index theory, was introduced, and it has proved to be better than the other methods. However, we found that the Gini-Index still shows a feature selection bias in text classification, specifically for unbalanced datasets having a huge number of features. This bias appears in three ways: 1) the Gini values of low-frequency features are low (on the purity measure) overall, irrespective of the distribution of features among classes, 2) for high-frequency features, the Gini values are always relatively high and 3) for specific features belonging to large classes, the Gini values are relatively lower than those belonging to small classes. Therefore, to correct that bias and improve feature selection in text classification using the Gini-Index, we propose an improved Gini-Index (I-GI) algorithm with three reformulated Gini-Index expressions. In the present study, we used global dimensionality reduction (DR) and local DR to measure the goodness of features in feature selection. In experimental results for the I-GI algorithm, we obtained unbiased feature values and eliminated many irrelevant general features while retaining many specific features. Furthermore, we could improve the overall classification performance when we used the local DR method. Compared with tf*idf, χ2, Information Gain, Odds Ratio and the existing Gini-Index methods, respectively, the total averages of the classification performance increased by 19.4%, 15.9%, 3.3%, 2.8% and 2.9% (kNN) and by 14%, 9.8%, 9.2%, 3.5% and 4.3% (SVM) in Micro-F1, and by 20%, 16.9%, 2.8%, 3.6% and 3.1% (kNN) and by 16.3%, 14%, 7.1%, 4.4% and 6.3% (SVM) in Macro-F1.
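For orientation, the classical Gini impurity that decision-tree split measures build on can be sketched as follows. This is the textbook form, 1 minus the sum of squared class proportions, and not the reformulated I-GI expressions proposed in the paper; the toy documents and the helper name `gini_for_term` are assumptions made for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Classical Gini impurity: 1 - sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def gini_for_term(docs, term):
    """Gini impurity of the class labels of documents that contain `term`."""
    labels = [label for words, label in docs if term in words]
    return gini_impurity(labels)


# Hypothetical toy corpus: (set of words, class label).
docs = [
    ({"goal", "match"}, "sports"),
    ({"goal", "vote"}, "politics"),
    ({"match", "team"}, "sports"),
    ({"vote", "law"}, "politics"),
]

print(gini_for_term(docs, "goal"))   # term spread evenly over two classes
print(gini_for_term(docs, "match"))  # term concentrated in one class
```

A term that occurs in only one class has impurity 0, so lower impurity (higher purity) marks a more discriminative term under this measure.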

  • MicroRNA Expression Profiles for Classification and Analysis of Tumor Samples

    Dang Hung TRAN  Tu Bao HO  Tho Hoan PHAM  Kenji SATOU  

     
    PAPER

    Vol: E94-D No:3
    Page(s): 416-422

    MicroRNAs (miRNAs), one kind of functional noncoding RNA, form a class of endogenous RNAs that can have important regulatory roles in animals and plants by targeting transcripts for cleavage or translation repression. Research using both experimental and computational approaches has shown that miRNAs are indeed involved in human cancer development and progression. However, the miRNAs that contribute the most information to the distinction between normal and tumor samples (tissues) are still undetermined. Recently, high-throughput microarray technology has been used as a powerful technique to measure the expression level of miRNAs in cells. Analyzing this expression data can allow us to determine the functional roles of miRNAs in living cells. In this paper, we present a computational method for (1) predicting tumor tissues using high-throughput miRNA expression profiles and (2) finding the informative miRNAs that show a strong distinction of expression level in tumor tissues. To this end, we apply a support vector machine (SVM) based method to examine one recent miRNA expression dataset in depth. The experimental results show that the SVM-based method outperforms other supervised learning methods such as decision trees, Bayesian networks, and backpropagation neural networks. Furthermore, by using the miRNA-target information and Gene Ontology annotations, we show that there is strong evidence relating the informative miRNAs to some types of human cancer, including breast, lung, and colon cancer.

  • Extraction of Combined Features from Global/Local Statistics of Visual Words Using Relevant Operations

    Tetsu MATSUKAWA  Takio KURITA  

     
    LETTER-Image Recognition, Computer Vision

    Vol: E93-D No:10
    Page(s): 2870-2874

    This paper presents a combined feature extraction method to improve the performance of bag-of-features image classification. We apply 10 relevant operations to global/local statistics of visual words. Because the number of pairwise combinations of visual words is large, we apply feature selection methods including the Fisher discriminant criterion and L1-SVM. The effectiveness of the proposed method is confirmed through experiments.

  • How the Number of Interest Points Affect Scene Classification

    Wenjie XIE  De XU  Shuoyan LIU  Yingjun TANG  

     
    LETTER-Image Recognition, Computer Vision

    Vol: E93-D No:4
    Page(s): 930-933

    This paper focuses on the relationship between the number of interest points and the accuracy rate in scene classification. It is commonly believed that more interest points yield higher accuracy, but little effort has been made to verify this. To validate this viewpoint, we carry out extensive experiments based on the bag-of-words method. In particular, three different SIFT descriptors and five feature selection methods are adopted to change the number of interest points. As novel contributions, we propose a dense SIFT descriptor named Octave Dense SIFT, which can generate more interest points and higher accuracy, and a new feature selection method called number mutual information (NMI), which is more robust than other feature selection methods. Experimental results show that the number of interest points can strongly affect classification accuracy.

  • Extended Relief-F Algorithm for Nominal Attribute Estimation in Small-Document Classification

    Heum PARK  Hyuk-Chul KWON  

     
    PAPER-Document Analysis

    Vol: E92-D No:12
    Page(s): 2360-2368

    This paper presents an extended Relief-F algorithm for nominal attribute estimation, for application to small-document classification. Relief algorithms are general and successful instance-based feature-filtering algorithms for data classification and regression. Many improved Relief algorithms have been introduced as solutions to problems of redundancy and irrelevant noisy features and to the limitations of the algorithms for multiclass datasets. However, these algorithms have only rarely been applied to text classification, because the numerous features in multiclass datasets lead to great time complexity. Therefore, in considering their application to text feature filtering and classification, we presented an extended Relief-F algorithm for numerical attribute estimation (E-Relief-F) in 2007. However, we found limitations and some problems with it. Therefore, in this paper, we introduce additional problems with Relief algorithms for text feature filtering, including the negative influence of computation similarities and weights caused by a small number of features in an instance, the absence of nearest hits and misses for some instances, and great time complexity. We then suggest a new extended Relief-F algorithm for nominal attribute estimation (E-Relief-Fd) to solve these problems, and we apply it to small text-document classification. We used the algorithm in experiments to estimate feature quality for various datasets, its application to classification, and its performance in comparison with existing Relief algorithms. The experimental results show that the new E-Relief-Fd algorithm offers better performance than previous Relief algorithms, including E-Relief-F.

  • A GMM-Based Feature Selection Algorithm for Multi-Class Classification

    Tacksung CHOI  Sunkuk MOON  Young-cheol PARK  Dae-hee YOUN  Seokpil LEE  

     
    LETTER-Pattern Recognition

    Vol: E92-D No:8
    Page(s): 1584-1587

    In this paper, we propose a new feature selection algorithm for multi-class classification. The proposed algorithm is based on Gaussian mixture models (GMMs) of the features, and it uses the distance between the two least separable classes as a metric for feature selection. The proposed system was tested with a support vector machine (SVM) for multi-class classification of music. Results show that the proposed feature selection scheme is superior to conventional schemes.

  • Differentiating Honeycombed Images from Normal HRCT Lung Images

    Aamir Saeed MALIK  Tae-Sun CHOI  

     
    LETTER-Biological Engineering

    Vol: E92-D No:5
    Page(s): 1218-1221

    A classification method is presented for differentiating honeycombed High Resolution Computed Tomographic (HRCT) images from normal HRCT images. For successful classification of honeycombed HRCT images, a complete set of methods and algorithms is described from segmentation to extraction to feature selection to classification. Wavelet energy is selected as a feature for classification using K-means clustering. Test data of 20 patients are used to validate the method.

  • A Filter Method for Feature Selection for SELDI-TOF Mass Spectrum

    Trung-Nghia VU  Syng-Yup OHN  

     
    LETTER-Pattern Recognition

    Vol: E92-D No:2
    Page(s): 346-348

    We propose a new filter method for feature selection for SELDI-TOF mass spectrum datasets. In the method, a new relevance index is defined to represent the goodness of a feature by considering the distribution of samples based on counts. The relevance index can be used to obtain the feature sets for classification. Because it is based on simple counting of samples, our method can be applied to mass spectrum datasets with extremely high dimensions and can process clinical datasets of practical size in acceptable calculation time. The new method was applied to three public mass spectrum datasets and showed results better than or comparable to those of conventional filter methods.

  • Modeling Network Intrusion Detection System Using Feature Selection and Parameters Optimization

    Dong Seong KIM  Jong Sou PARK  

     
    PAPER-Application Information Security

    Vol: E91-D No:4
    Page(s): 1050-1057

    Previous approaches for modeling Intrusion Detection Systems (IDS) have been twofold: improving detection model(s) in terms of (i) feature selection of audit data through wrapper and filter methods and (ii) parameter optimization of detection model design, based on classification, clustering algorithms, etc. In this paper, we present three approaches to modeling IDS in the context of feature selection and parameter optimization. First, we present Fusion of Genetic Algorithm (GA) and Support Vector Machines (SVM) (FuGAS), which combines GA and SVM through genetic operations and is capable of building an optimal detection model with only the selected important features and optimal parameter values. Second, we present Correlation-based Hybrid Feature Selection (CoHyFS), which uses a filter method in conjunction with GA for feature selection in order to reduce long training times. Third, we present Simultaneous Intrinsic Model Identification (SIMI), which adopts Random Forest (RF) and shows better intrusion detection rates and feature selection results with no additional computational overhead. We show the experimental results and analysis of the three approaches on the KDD 1999 intrusion detection datasets.

  • A Learning Algorithm of Boosting Kernel Discriminant Analysis for Pattern Recognition

    Shinji KITA  Seiichi OZAWA  Satoshi MAEKAWA  Shigeo ABE  

     
    PAPER-Biocybernetics, Neurocomputing

    Vol: E90-D No:11
    Page(s): 1853-1863

    In this paper, we present a new method to enhance classification performance of a multiple classifier system by combining a boosting technique called AdaBoost.M2 and Kernel Discriminant Analysis (KDA). To reduce the dependency between classifier outputs and to speed up the learning, each classifier is trained in a different feature space, which is obtained by applying KDA to a small set of hard-to-classify training samples. The training of the system is conducted based on AdaBoost.M2, and the classifiers are implemented by Radial Basis Function networks. To perform KDA at every boosting round within a realistic time, a new kernel selection method based on the class separability measure is proposed. Furthermore, a new criterion for training convergence is also proposed to acquire good classification performance with fewer boosting rounds. To evaluate the proposed method, several experiments are carried out using standard evaluation datasets. The experimental results demonstrate that the proposed method can select an optimal kernel parameter more efficiently than the conventional cross-validation method, and that the training of boosting classifiers terminates after a fairly small number of rounds while attaining good classification accuracy. For multi-class classification problems, the proposed method outperforms both Boosting Linear Discriminant Analysis (BLDA) and Radial-Basis Function Network (RBFN) with regard to classification accuracy. On the other hand, the performance evaluation for 2-class problems shows that the advantage of the proposed BKDA over BLDA and RBFN depends on the datasets.

  • Kernel Trees for Support Vector Machines

    Ithipan METHASATE  Thanaruk THEERAMUNKONG  

     
    PAPER

    Vol: E90-D No:10
    Page(s): 1550-1556

    Support vector machines (SVMs) are one of the most effective classification techniques in several knowledge discovery and data mining applications. However, an SVM requires the user to set the form of its kernel function and the parameters of the function, both of which directly affect the performance of the classifier. This paper proposes a novel method, named a kernel tree, whose function is composed of multiple kernels arranged in a tree structure. The optimal kernel tree structure and its parameters are determined by genetic programming (GP). To fine-tune the kernel parameters, the gradient descent method is used. To evaluate the proposed method, benchmark datasets from UCI and a text classification dataset are used. The results indicate that the method can find a better solution than the grid search and the gradient search.
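The idea of composing several kernels in a tree can be illustrated with a minimal sketch, assuming the internal nodes are sums and products (both of which preserve kernel validity) and the leaves are linear and RBF kernels. The operator set, the leaf kernels, and every name below are assumptions for illustration; the paper's actual node types and its genetic programming search are not reproduced.

```python
import math


def linear(x, y):
    """Leaf kernel: dot product."""
    return sum(a * b for a, b in zip(x, y))


def rbf(gamma):
    """Leaf kernel factory: Gaussian RBF with width parameter gamma."""
    def kernel(x, y):
        squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-gamma * squared_distance)
    return kernel


def add(k1, k2):
    """Internal node: the sum of two kernels is a kernel."""
    return lambda x, y: k1(x, y) + k2(x, y)


def mul(k1, k2):
    """Internal node: the product of two kernels is a kernel."""
    return lambda x, y: k1(x, y) * k2(x, y)


# Tree:        add
#            /     \
#       linear      mul
#                  /    \
#            rbf(0.5)   linear
kernel_tree = add(linear, mul(rbf(0.5), linear))

value = kernel_tree([1.0, 0.0], [1.0, 0.0])
print(value)
```

In a search procedure, the tree shape and the `gamma` values would be the quantities being optimized; here they are fixed by hand.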

  • Feature Selection in Genetic Fuzzy Discretization for the Pattern Classification Problems

    Yoon-Seok CHOI  Byung-Ro MOON  

     
    PAPER-Pattern Recognition

    Vol: E90-D No:7
    Page(s): 1047-1054

    We propose a new genetic fuzzy discretization method with feature selection for pattern classification problems. Traditional discretization methods categorize a continuous attribute into a number of bins. Because they rely on crisp discretization, considerable information is lost. Fuzzy discretization allows overlapping intervals and reflects linguistic classification. However, the number of intervals, the boundaries of the intervals, and the degrees of overlap are difficult to optimize, and the discretization process increases the total amount of data being transformed. We use a genetic algorithm with feature selection not only to optimize these parameters but also to reduce the amount of transformed data by filtering out irrelevant attributes. Experimental results show considerable improvement in classification accuracy over a crisp discretization and a typical fuzzy discretization with feature selection.
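The contrast between crisp bins and overlapping fuzzy intervals can be made concrete with a minimal sketch, assuming triangular membership functions. The membership shape, the interval boundaries, and the names `triangular`, `low`, and `high` are assumptions for illustration; the paper's genetic optimization of these parameters is not shown.

```python
def triangular(left, peak, right):
    """Return a triangular membership function over [left, right]."""
    def membership(x):
        if x <= left or x >= right:
            return 0.0
        if x <= peak:
            return (x - left) / (peak - left)
        return (right - x) / (right - peak)
    return membership


# Two overlapping fuzzy intervals for one continuous attribute.
low = triangular(0.0, 2.0, 4.0)
high = triangular(2.0, 4.0, 6.0)

# A value in the overlap belongs partly to both intervals,
# whereas a crisp bin boundary would force it into exactly one.
x = 3.0
memberships = {"low": low(x), "high": high(x)}
print(memberships)
```

Each continuous value is thus mapped to one membership degree per interval, which is also why fuzzy discretization increases the amount of transformed data.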

  • A Novel Feature Selection for Fuzzy Neural Networks for Personalized Facial Expression Recognition

    Dae-Jin KIM  Zeungnam BIEN  

     
    PAPER

    Vol: E87-A No:6
    Page(s): 1386-1392

    This paper proposes a novel feature selection method for fuzzy neural networks and presents an application example for 'personalized' facial expression recognition. The proposed method is shown to yield performance superior to that of many existing approaches.

  • Two Step POS Selection for SVM Based Text Categorization

    Takeshi MASUYAMA  Hiroshi NAKAGAWA  

     
    PAPER

    Vol: E87-D No:2
    Page(s): 373-379

    Although many researchers have verified the superiority of the Support Vector Machine (SVM) on text categorization tasks, some recent papers have reported much lower performance of SVM based text categorization methods when using all types of parts of speech (POS) as input words and handling large numbers of training documents. This was caused by overfitting, in which the SVM sometimes selected unsuitable support vectors for each category in the training set. To avoid the overfitting problem, we propose a two step text categorization method with a variable cascaded feature selection (VCFS) using SVM. The VCFS method selects the best pair of number of words and POS combination for each category at each step of the cascade. We made use of the difference of words with the highest mutual information for each category on each POS combination. Through experiments, we confirmed the validity of the VCFS method compared with other SVM based text categorization methods: the macro-averaged F1 measure (64.8%) of the VCFS method was significantly better than any reported F1 measure, though its micro-averaged F1 measure (85.4%) was similar to them.
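Ranking words by their mutual information with a category, as mentioned in the abstract above, can be sketched as follows. This is the standard discrete mutual information in bits, estimated from counts; the toy documents and the helper names are assumptions for illustration, and the cascade and POS selection of VCFS are not reproduced.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    total = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) rewritten in terms of raw counts.
        total += p_xy * math.log2(count * n / (count_x[x] * count_y[y]))
    return total


def rank_words(docs, category):
    """Rank words by MI between word presence and membership in `category`."""
    vocabulary = set().union(*(words for words, _ in docs))
    in_category = [label == category for _, label in docs]
    scores = {
        word: mutual_information([word in words for words, _ in docs], in_category)
        for word in vocabulary
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy corpus: (set of words, category label).
docs = [
    ({"goal", "the"}, "sports"),
    ({"goal", "team"}, "sports"),
    ({"vote", "the"}, "politics"),
    ({"vote", "law"}, "politics"),
]

ranking = rank_words(docs, "sports")
print(ranking)
```

Words whose presence determines the category ("goal", "vote") score 1 bit here, while a word spread evenly across categories ("the") scores 0.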

  • Fractal Neural Network Feature Selector for Automatic Pattern Recognition System

    Basabi CHAKRABORTY  Yasuji SAWADA  

     
    PAPER

    Vol: E82-A No:9
    Page(s): 1845-1850

    Feature selection is an integral part of any pattern recognition system. Removal of redundant features improves the efficiency of a classifier and cuts down the cost of future feature extraction. Recently, neural network classifiers have become extremely popular compared to their counterparts from statistical theory. Some work on the use of artificial neural networks as feature selectors has already been reported. In this work, a simple feature selection algorithm is proposed in which a fractal neural network, a modified version of the multilayer perceptron, is used as a feature selector. Simulation experiments were carried out with the IRIS and SONAR data sets. The results suggest that the algorithm with the fractal network architecture works well for removing redundant information, as tested by classification rate. Because of its lower connectivity, the fractal neural network takes less training time than the conventional multilayer perceptron, while its performance is comparable. The ease of hardware implementation is also an attractive point in designing a feature selector with a fractal neural network.

  • Texture Segmentation Using Separable and Non-Separable Wavelet Frames

    Jeng-Shyang PAN  Jing-Wein WANG  

     
    PAPER

    Vol: E82-A No:8
    Page(s): 1463-1474

    In this paper, a new feature, characterized by the extrema density of 2-D wavelet frames estimated at the output of the corresponding filter bank, is proposed for texture segmentation. With and without feature selection, the discrimination ability of features based on pyramidal and tree-structured decompositions is comparatively studied using the extrema density, energy, and entropy as features, respectively. These comparisons are demonstrated with separable and non-separable wavelets. With the three-, four-, and five-category textured images from the Brodatz album, it is observed that most performances with feature selection are significantly better than those without feature selection. In addition, the experimental results show that the extrema density-based measure performs best among the three types of features investigated. A Min-Min method based on genetic algorithms, a novel approach with the spatial separation criterion (SPC) as the evaluation function, is presented to evaluate the segmentation performance of each subset of selected features. In this work, the SPC is defined as the Euclidean distance within class divided by the Euclidean distance between classes in the spatial domain. It is shown that, with feature selection, the tree-structured wavelet decomposition based on non-separable wavelet frames performs better in the experiments than the tree-structured wavelet decomposition based on separable wavelet frames and the pyramidal decomposition based on separable and non-separable wavelet frames. Finally, we compare against the segmentation results evaluated with the templates of the textured images and verify the effectiveness of the proposed criterion. Moreover, the feature selection vector shows that the discriminatory characteristics of features do spread over all subbands.

  • Genetic Feature Selection for Texture Classification Using 2-D Non-Separable Wavelet Bases

    Jing-Wein WANG  Chin-Hsing CHEN  Jeng-Shyang PAN  

     
    PAPER

    Vol: E81-A No:8
    Page(s): 1635-1644

    In this paper, the performance of texture classification based on pyramidal and uniform decomposition is comparatively studied with and without feature selection. This comparison, using the subband variance as the feature, explores the dependence among features. It is shown that the main problem when employing 2-D non-separable wavelet transforms for texture classification is determining the suitable features that yield the best classification results. A Max-Max algorithm, a novel evaluation function based on genetic algorithms, is presented to evaluate the classification performance of each subset of selected features. It is shown that the performance with feature selection, in which only about half of the features are selected, is comparable to that without feature selection. Moreover, the discriminatory characteristics of texture spread more in the low-pass bands, and the features extracted from the pyramidal decomposition are more representative than those from the uniform decomposition. Experimental results have verified the selectivity of the proposed approach and its texture capturing characteristics.
